Generic rank-one corrections for value iteration in Markovian decision problems
Abstract
Given a linear iteration of the form x := F(x), we consider modified versions of the form x := F(x + γd), where d is a fixed direction and γ is chosen to minimize the norm of the residual ‖x + γd − F(x + γd)‖. We propose ways to choose d so that the convergence rate of the modified iteration is governed by the subdominant eigenvalue of the original. In the special case where F relates to a Markovian decision problem, we obtain a new extrapolation method for value iteration. In particular, our method accelerates the Gauss-Seidel version of the value iteration method for discounted problems in the same way that MacQueen's error bounds accelerate the standard version. Furthermore, our method applies equally well to Markov renewal and undiscounted problems.

1. Research supported by NSF under Grant CCR-9103804. Thanks are due to David Castanon for stimulating discussions.
2. Department of Electrical Engineering and Computer Science, M.I.T., Cambridge, Mass., 02139.
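As a rough illustration of the idea (a minimal sketch, not the paper's published algorithm), the snippet below applies a rank-one correction of this form to the linear policy-evaluation iteration x := αPx + g of a discounted Markov chain, i.e. F(x) = Ax + b with A = αP. For a linear F the residual of x + γd is r + γ(I − A)d with r = x − F(x), so minimizing its Euclidean norm over γ gives the closed-form value γ = −⟨r, (I − A)d⟩ / ‖(I − A)d‖². The direction d is taken as the all-ones vector, which is the dominant eigenvector of αP; the data P, g, α and the tolerance are arbitrary illustrative choices, and the Euclidean norm is used only for concreteness.

```python
import numpy as np

# Minimal sketch, assuming F(x) = A x + b with A = alpha * P (policy evaluation
# for a discounted Markov chain).  P, g, alpha, the tolerance and the direction
# d are illustrative choices, not taken from the paper.
rng = np.random.default_rng(0)
n, alpha = 8, 0.95
P = rng.random((n, n))
P /= P.sum(axis=1, keepdims=True)        # row-stochastic transition matrix
g = rng.random(n)                        # per-stage costs
A, b = alpha * P, g
F = lambda x: A @ x + b                  # the original linear iteration map

d = np.ones(n)                           # dominant eigenvector of alpha*P (since P e = e)
Md = d - A @ d                           # (I - A) d, computed once

x = np.zeros(n)
for k in range(500):
    r = x - F(x)                                 # residual of the current iterate
    gamma = -np.dot(r, Md) / np.dot(Md, Md)      # minimizes ||r + gamma * (I - A) d||
    x = F(x + gamma * d)                         # corrected update x := F(x + gamma * d)
    if np.linalg.norm(x - F(x), np.inf) < 1e-12:
        break

x_star = np.linalg.solve(np.eye(n) - A, b)       # exact fixed point, for comparison
print(k, np.linalg.norm(x - x_star, np.inf))
```

With d equal to the dominant eigenvector, the correction removes the residual component along (I − A)d at every step, so the remaining error contracts roughly at the subdominant rate described in the abstract.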
Similar articles
Acceleration Operators in the Value Iteration Algorithms for Average Reward Markov Decision Processes
One of the most widely used methods for solving average cost MDP problems is the value iteration method. This method, however, is often computationally impractical and limited in the size of MDP problems it can solve. We propose acceleration operators that improve the performance of value iteration for average reward MDP models. These operators are based on two important properties of Markovian ...
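For context only, here is a minimal sketch of the baseline that abstract refers to, namely standard relative value iteration for an average-reward MDP; the acceleration operators proposed in the cited paper are not reproduced, and the transition and reward arrays and the reference state are illustrative placeholders.

```python
import numpy as np

# Baseline only: standard relative value iteration for an average-reward MDP.
# P[a, i, j] and r[a, i] are illustrative data; state 0 is the reference state.
rng = np.random.default_rng(1)
nS, nA = 5, 3
P = rng.random((nA, nS, nS))
P /= P.sum(axis=2, keepdims=True)         # stochastic transition matrix per action
r = rng.random((nA, nS))                  # expected one-step rewards

h = np.zeros(nS)                          # relative value (bias) estimate
for _ in range(10_000):
    Q = r + np.einsum('aij,j->ai', P, h)  # one-step lookahead values per action
    Th = Q.max(axis=0)                    # Bellman operator (maximizing reward)
    h_new = Th - Th[0]                    # keep the reference state pinned at 0
    if np.max(np.abs(h_new - h)) < 1e-10:
        break
    h = h_new

print("estimated optimal average reward:", Th[0])
```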
A New Value Iteration Method for the Average Cost Dynamic Programming Problem
We propose a new value iteration method for the classical average cost Markovian decision problem, under the assumption that all stationary policies are unichain and that, furthermore, there exists a state that is recurrent under all stationary policies. This method is motivated by a relation between the average cost problem and an associated stochastic shortest path problem. Contrary to the st...
Affine Monotonic and Risk-Sensitive Models in Dynamic Programming
In this paper we consider a broad class of infinite horizon discrete-time optimal control models that involve a nonnegative cost function and an affine mapping in their dynamic programming equation. They include as special cases classical models such as stochastic undiscounted nonnegative cost problems, stochastic multiplicative cost problems, and risk-sensitive problems with exponential cost. ...
Q-Learning and Policy Iteration Algorithms for Stochastic Shortest Path Problems
We consider the stochastic shortest path problem, a classical finite-state Markovian decision problem with a termination state, and we propose new convergent Q-learning algorithms that combine elements of policy iteration and classical Q-learning/value iteration. These algorithms are related to the ones introduced by the authors for discounted problems in [BY10b]. The main difference from the s...
Application of variational iteration method for solving singular two point boundary value problems
In this paper, He's highly prolific variational iteration method is applied effectively to show the existence and uniqueness of, and to solve, a class of singular second-order two-point boundary value problems. The process of finding the solution involves the generation of a sequence of appropriate approximate iterative solution functions equally likely to converge to the exact solution of the given problem w...
Journal: Oper. Res. Lett.
Volume: 17
Issue: –
Pages: –
Publication date: 1995